
3 research outputs found

An improved Arabic text classification method using word embedding

Author: Bahassine Said
El Beggar Omar
Kissi Mohamed
Sabri Tarik
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2024

Feature selection (FS) is a widely used method for removing redundant or irrelevant features to improve classification accuracy and decrease the model's computational cost. In this paper, we present an improved method (referred to hereafter as RARF) for Arabic text classification (ATC) that employs term frequency-inverse document frequency (TF-IDF) weighting and the Word2Vec embedding technique to identify words that have a particular semantic relationship. We compare our method with four benchmark FS methods, namely principal component analysis (PCA), linear discriminant analysis (LDA), chi-square, and mutual information (MI). Three machine learning algorithms are used in this work: support vector machine (SVM), k-nearest neighbors (K-NN), and naive Bayes (NB). Two different Arabic datasets are used to perform a comparative analysis of these algorithms. This paper also evaluates the efficiency of our method for ATC on the basis of four performance metrics, namely accuracy, precision, recall, and F-measure. Results show that the highest accuracy, 94.75%, was achieved by the SVM classifier on the Khaleej-2004 Arabic dataset, while the same classifier recorded an accuracy of 94.01% on the Watan-2004 Arabic dataset.


Text classification supervised algorithms with term frequency inverse document frequency and global vectors for word representation: a comparative study

Author: Bahassine Said
Benabbes Khalid
Hamou Aadi Fatima Zahrae Ait
Housni Khalid
Labd Zakia
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/02/2024

Over the past two decades, the quantity of text documents stored digitally has risen. Automatically organizing those documents into a set of predefined categories, known as text categorization, allows them to be preserved and sorted more efficiently. Identifying appropriate structures, architectures, and methods for text classification presents a challenge for researchers. This is due to the significant impact text classification has on content management, contextual search, opinion mining, product review analysis, spam filtering, and text sentiment mining. This study analyzes the generic categorization strategy and examines supervised machine learning approaches and their ability to comprehend complex models and nonlinear data interactions. Among these methods are k-nearest neighbors (KNN), support vector machine (SVM), and ensemble learning algorithms employing various evaluation techniques. Thereafter, the constraints of each technique and how they can be applied to real-life situations are evaluated.


The Analysis of Attribution Reduction of K-Nearest Neighbor

Author: Alpaydin
Arinal Ihsan M.
Bahassine Said
Ejazuddin Syed Muhammad
Han J.
Hand D.
Indriyanto Jatmiko
Listiowarni Indah
Liu
Miles
Muhammad Danil
Novaković J
Novaković J.
Rachburee Nachirat
Rahmat Widia Sembiring
Refaeilzadeh P.
Syahril Efendi
Wibowo Haryanto Ardy
Witten
Wu Runxiu
Publication venue: IOP Publishing
